Scalable Linear Causal Inference for Irregularly Sampled Time Series with Long Range Dependencies
Authors
Abstract
Linear causal analysis is central to a wide range of important applications spanning finance, the physical sciences, and engineering. Much of the existing literature on linear causal analysis operates in the time domain. Unfortunately, applying time-domain linear causal analysis directly to many real-world time series presents three critical challenges: irregular temporal sampling, long-range dependencies, and scale. Real-world data is often collected at irregular time intervals across vast arrays of decentralized sensors, and it exhibits long-range dependencies [1], which make naive time-domain correlation estimators spurious [2]. In this paper we present a frequency-domain estimation framework that naturally handles irregularly sampled data and long-range dependencies while enabling memory- and communication-efficient distributed processing of time series data. By operating in the frequency domain, we eliminate the need to interpolate and help mitigate the effects of long-range dependencies. We implement and evaluate our new workflow in the distributed setting using Apache Spark and demonstrate, on both Monte Carlo simulations and high-frequency financial trading data, that we can accurately recover causal structure at scale.
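The claim that working in the frequency domain removes the need to interpolate can be illustrated with a small sketch. It is not the paper's estimator. It assumes a direct nonuniform Fourier sum evaluated at the observed timestamps, and it reads a lead-lag relationship off the phase of the cross-spectrum. The function names `nudft` and `cross_spectrum`, the synthetic sinusoid, and the delay of 3 time units are all hypothetical choices made for illustration.

```python
import numpy as np

def nudft(times, values, freqs):
    """Direct nonuniform Fourier sum: mean of x_k * exp(-2*pi*i*f*t_k).

    Works on the raw, irregular timestamps; no resampling or interpolation.
    (Illustrative helper, not taken from the paper.)
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    values = values - values.mean()  # remove the zero-frequency component
    phase = np.exp(-2j * np.pi * np.outer(freqs, times))
    return phase @ values / len(times)

def cross_spectrum(tx, x, ty, y, freqs):
    """Cross-spectrum estimate X(f) * conj(Y(f)) for two series that may be
    sampled at different, irregular times."""
    return nudft(tx, x, freqs) * np.conj(nudft(ty, y, freqs))

# Synthetic example (assumed setup): y is x delayed by `delay` time units,
# and the two series are observed at independent random timestamps.
rng = np.random.default_rng(0)
f0, delay = 0.05, 3.0
tx = np.sort(rng.uniform(0.0, 1000.0, 2000))
ty = np.sort(rng.uniform(0.0, 1000.0, 2000))
x = np.sin(2 * np.pi * f0 * tx)
y = np.sin(2 * np.pi * f0 * (ty - delay))

freqs = np.linspace(0.01, 0.1, 91)
sxy = cross_spectrum(tx, x, ty, y, freqs)

# The peak of |S_xy| locates the shared frequency; with this sign convention
# the phase at the peak is approximately 2*pi*f*delay when y lags x.
k = np.argmax(np.abs(sxy))
peak = freqs[k]
est_delay = np.angle(sxy[k]) / (2 * np.pi * peak)
print(f"peak frequency ~ {peak:.3f}, estimated delay ~ {est_delay:.2f}")
```

Because each Fourier coefficient is a sum over one series' own observations, coefficients can be accumulated per partition and combined later, which is one reason a frequency-domain formulation suits distributed processing. The direct sum costs O(number of frequencies × number of samples); a production implementation would likely use a faster nonuniform transform.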
Similar resources
Wavelet analysis of GRACE K-band range rate measurements related to Urmia Basin
Space-borne gravity data from Gravity Recovery and Climate Experiment (GRACE), as well as some other in situ and remotely sensed satellite data have been used to determine water storage changes in Lake Urmia Basin (Iran). As usual, the GRACE products are derived from precise inter-satellite range rate measurements converted to different formats such as spherical harmonic coefficients and equiva...
Modeling Clinical Time Series Using Gaussian Process Sequences
Development of accurate models of complex clinical time series data is critical for understanding the disease, its dynamics, and subsequently patient management and clinical decision making. Clinical time series differ from other time series applications mainly in that observations are often missing and made at irregular time intervals. In this work, we propose and test a new probabilistic appr...
On Causality Inference in Time Series
Causality discovery has been one of the core tasks in scientific research since the beginning of human scientific history. In the age of the data tsunami, the task could involve millions of variables, which cannot feasibly be handled by humans. However, causal discovery using artificial intelligence and statistical techniques in non-experimental settings faces several challenges. In this work, ...
Probabilistic broken-stick model: A regression algorithm for irregularly sampled data with application to eGFR
In order for clinicians to manage disease progression and make effective decisions about drug dosage, treatment regimens or scheduling follow up appointments, it is necessary to be able to identify both short and long-term trends in repeated biomedical measurements. However, this is complicated by the fact that these measurements are irregularly sampled and influenced by both genuine physiologi...
A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification
We present a general framework for classification of sparse and irregularly-sampled time series. The properties of such time series can result in substantial uncertainty about the values of the underlying temporal processes, while making the data difficult to deal with using standard classification methods that assume fixed-dimensional feature spaces. To address these challenges, we propose an u...
Journal title:
- CoRR
Volume: abs/1603.03336
Issue: -
Pages: -
Publication date: 2016